Homework 1

Author

YOUR NAME HERE

Published

September 8, 2024

Load Packages

library(Hmisc)
library(tidyverse)

Problem 1

Survey

Sunday, September 1st at 9:03pm

Campuswire

Problem 2

Question 1

For Data Set 1, the study population consists of individuals aged 16 and older living in private households in England and Wales. For Data Set 2, the study population consists of all crimes reported to and investigated by the UK police.

Question 2

For Data Set 1, a survey sampling method is used: approximately 38,000 individuals are selected to self-report their experiences with crime. For Data Set 2, administrative records of all crimes investigated by the police are used, which implies a comprehensive, non-sampled approach to data collection.

Question 3

For Data Set 1, the sampled population consists of individuals aged 16 and over who do not live in communal settings and who participated in the survey. For Data Set 2, the sampled population consists of all criminal incidents investigated and recorded by the UK police according to their internal crime definitions.

Question 4

The target population for both data sets is the entire population of England and Wales, including all people who may experience crime.

Question 5

Data Set 1 may face reliability issues due to self-reported responses, which can vary based on individual recall or perception. Data Set 2 is likely more reliable because it involves police records, though inconsistencies may arise from varying police practices.

Data Set 1’s validity could be questioned due to subjective experiences of crime, while Data Set 2’s validity depends on how accurately police define and record crimes, potentially missing unreported crimes.

Problem 3

Question 1

The <- notation is R's assignment operator. At the top level of a script it behaves like an = sign, and it is the conventional way to create variables. After running this code chunk, the data frame named df appears in the Environment pane, which by default sits on the right-hand side of RStudio.

df <- read_csv('https://www.openintro.org/data/csv/babies.csv')
Rows: 1236 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (8): case, bwt, gestation, parity, age, height, weight, smoke

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
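
As a small illustration of the assignment notation described in this question, the sketch below uses only base R. The names x, y, and m are made up for the example and are not part of the homework. It shows that <- and = both create a variable at the top level, while an = inside a function call only names an argument.

```r
# Both forms bind a value to a name at the top level of a script.
x <- c(1, 2, 3)
y = c(1, 2, 3)
same_value <- identical(x, y)

# Inside a call, `=` matches an argument by name; it does not assign.
# The global `x` defined above is left unchanged by this line.
m <- mean(x = c(10, 20, 30))

print(same_value)
print(m)
print(x)
```

Because of this difference, many R style guides prefer <- for assignment and reserve = for naming arguments.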

Question 2

The notation Hmisc:: calls a function directly from the Hmisc package. describe() is a common function name, so the prefix is sometimes needed to tell R which package's version of the function you want to use. The pipe operator |> passes the result of the expression on its left to the function on its right as that function's first argument, which is a convenient way to chain functions together.

This code prints a useful and attractive summary of the data set we are using.

Hmisc::describe(df) |> 
  html()
df Descriptives

8 Variables   1236 Observations

case
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     1236        0     1236        1    618.5    412.3    62.75   124.50   309.75 
      .50      .75      .90      .95 
   618.50   927.25  1112.50  1174.25  
lowest : 1 2 3 4 5 , highest: 1232 1233 1234 1235 1236
bwt
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     1236        0      107        1    119.6    20.33     88.0     97.0    108.8 
      .50      .75      .90      .95 
    120.0    131.0    142.0    149.0 
lowest : 55 58 62 63 65 , highest: 169 170 173 174 176
gestation
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     1223       13      106    0.999    279.3    16.57    252.0    262.0    272.0 
      .50      .75      .90      .95 
    280.0    288.0    295.8    302.0 
lowest : 148 181 204 223 224 , highest: 330 336 338 351 353
parity
        n  missing distinct     Info      Sum     Mean      Gmd 
     1236        0        2    0.573      315   0.2549   0.3801 

age
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     1234        2       30    0.997    27.26    6.506       19       20       23 
      .50      .75      .90      .95 
       26       31       36       38 
lowest : 15 17 18 19 20 , highest: 41 42 43 44 45
height
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     1214       22       19    0.986    64.05    2.839       60       61       62 
      .50      .75      .90      .95 
       64       66       67       68 
 Value         53    54    56    57    58    59    60    61    62    63    64    65
 Frequency      1     1     1     1    10    26    55   105   131   166   183   182
 Proportion 0.001 0.001 0.001 0.001 0.008 0.021 0.045 0.086 0.108 0.137 0.151 0.150

 Value         66    67    68    69    70    71    72
 Frequency    153   105    54    20    13     6     1
 Proportion 0.126 0.086 0.044 0.016 0.011 0.005 0.001 
For the frequency table, variable is rounded to the nearest 0
weight
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     1200       36      105    0.999    128.6    22.39    102.0    105.0    114.8 
      .50      .75      .90      .95 
    125.0    139.0    155.0    170.0 
lowest : 87 89 90 91 92 , highest: 215 217 220 228 250
smoke
        n  missing distinct     Info      Sum     Mean      Gmd 
     1226       10        2    0.717      484   0.3948   0.4782 
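
To show the :: and |> notation from this question in isolation, here is a minimal sketch that relies only on packages shipped with R. The vector values and the result names are invented for the illustration, and the native pipe assumes R 4.1 or later.

```r
values <- c(4, 9, 16)

# The pipe passes the left-hand result as the first argument of the next call,
# so this chain reads left to right: take values, then sqrt(), then sum().
piped <- values |> sqrt() |> sum()

# The same computation written as nested calls, read inside out.
nested <- sum(sqrt(values))

# The pkg::fun form names the package explicitly, which avoids ambiguity
# when two loaded packages export a function with the same name.
explicit <- stats::median(values)

print(piped)
print(nested)
print(explicit)
```

The piped and nested forms give the same result; the pipe simply keeps the steps in the order they are carried out.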

Question 3

The Child Health and Development Studies investigate a range of topics. One study, in particular, considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. The variables in this data set are as follows.

Data Dictionary

Variable Name   Variable Description                                            Variable Type
case            id number
bwt             birthweight, in ounces
gestation       length of gestation, in days
parity          binary indicator for a first pregnancy (0 = first pregnancy)
age             mother’s age in years
height          mother’s height in inches
weight          mother’s weight in pounds
smoke           binary indicator for whether the mother smokes
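
Since the binary indicators in the dictionary are stored as numbers, it can help to check how R has typed each column and to attach readable labels. The sketch below is only an illustration on a tiny made-up data frame called toy, not the real data; the label for parity = 1 is an assumption inferred from the "0 = first pregnancy" coding.

```r
# Hypothetical three-row stand-in for the babies data (values are invented).
toy <- data.frame(
  parity = c(0, 1, 0),
  bwt    = c(120, 113, 128)
)

# Report the storage class of every column; read_csv-style numeric columns
# show up as "numeric" even when they encode a yes/no category.
col_types <- sapply(toy, class)

# Recode the 0/1 indicator as a labelled factor.
# Assumption: 1 means "not first pregnancy", the complement of the documented 0.
toy$parity_f <- factor(
  toy$parity,
  levels = c(0, 1),
  labels = c("first pregnancy", "not first pregnancy")
)

print(col_types)
print(table(toy$parity_f))
```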

Question 4

Below, two numeric variables are investigated for a potential relationship. The independent (explanatory) variable plotted is gestation, and the dependent (response) variable is weight, matching the x and y aesthetics in the code.

df |>
  ggplot(aes(x = gestation, # explanatory variable (please change these)
             y = weight)) + # response variable
  geom_point()              # draw one point per row
Warning: Removed 48 rows containing missing values or values outside the scale range
(`geom_point()`).
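
The warning above most likely reflects missing values: geom_point() drops any row where either plotted variable is NA. The summary reports 13 missing gestation values and 36 missing weight values, so 48 removed rows would be consistent with one row missing both (13 + 36 - 1 = 48), assuming no points fall outside the scale range. The sketch below shows the counting logic on a small invented data frame named toy; on the real data the same expression would be applied to df.

```r
# Hypothetical five-row example (values are invented for illustration).
toy <- data.frame(
  gestation = c(280, NA, 275, NA, 290),
  weight    = c(120, 130, NA, NA, 110)
)

# Rows missing each variable separately.
n_na_gestation <- sum(is.na(toy$gestation))
n_na_weight    <- sum(is.na(toy$weight))

# Rows missing both, which would otherwise be counted twice.
n_na_both <- sum(is.na(toy$gestation) & is.na(toy$weight))

# Rows a scatterplot would drop: incomplete in at least one plotted variable.
n_dropped <- sum(!complete.cases(toy[, c("gestation", "weight")]))

print(n_dropped)
print(n_na_gestation + n_na_weight - n_na_both)
```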

Describe what you see in your plot here.

Session Info

This portion of the document describes the R environment (R version, platform, and package versions) under which this report was created. This is important to include so that the work is reproducible by others.

xfun::session_info()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22621)

Locale:
  LC_COLLATE=English_United States.utf8 
  LC_CTYPE=English_United States.utf8   
  LC_MONETARY=English_United States.utf8
  LC_NUMERIC=C                          
  LC_TIME=English_United States.utf8    

Package version:
  askpass_1.2.0       backports_1.5.0     base64enc_0.1-3    
  bit_4.0.5           bit64_4.0.5         blob_1.2.4         
  broom_1.0.6         bslib_0.8.0         cachem_1.1.0       
  callr_3.7.6         cellranger_1.1.0    checkmate_2.3.2    
  cli_3.6.3           clipr_0.8.0         cluster_2.1.6      
  colorspace_2.1-1    compiler_4.4.1      conflicted_1.2.0   
  cpp11_0.4.7         crayon_1.5.3        curl_5.2.1         
  data.table_1.15.4   DBI_1.2.3           dbplyr_2.5.0       
  digest_0.6.37       dplyr_1.1.4         dtplyr_1.3.1       
  evaluate_0.24.0     fansi_1.0.6         farver_2.1.2       
  fastmap_1.2.0       fontawesome_0.5.2   forcats_1.0.0      
  foreign_0.8-86      Formula_1.2-5       fs_1.6.4           
  gargle_1.5.2        generics_0.1.3      ggplot2_3.5.1      
  glue_1.7.0          googledrive_2.1.1   googlesheets4_1.1.1
  graphics_4.4.1      grDevices_4.4.1     grid_4.4.1         
  gridExtra_2.3       gtable_0.3.5        haven_2.5.4        
  highr_0.11          Hmisc_5.1-3         hms_1.1.3          
  htmlTable_2.4.3     htmltools_0.5.8.1   htmlwidgets_1.6.4  
  httr_1.4.7          ids_1.0.1           isoband_0.2.7      
  jquerylib_0.1.4     jsonlite_1.8.8      knitr_1.48         
  labeling_0.4.3      lattice_0.22.6      lifecycle_1.0.4    
  lubridate_1.9.3     magrittr_2.0.3      MASS_7.3.60.2      
  Matrix_1.7.0        memoise_2.0.1       methods_4.4.1      
  mgcv_1.9.1          mime_0.12           modelr_0.1.11      
  munsell_0.5.1       nlme_3.1.164        nnet_7.3-19        
  openssl_2.2.1       parallel_4.4.1      pillar_1.9.0       
  pkgconfig_2.0.3     prettyunits_1.2.0   processx_3.8.4     
  progress_1.2.3      ps_1.7.7            purrr_1.0.2        
  R6_2.5.1            ragg_1.3.2          rappdirs_0.3.3     
  RColorBrewer_1.1.3  readr_2.1.5         readxl_1.4.3       
  rematch_2.0.0       rematch2_2.1.2      reprex_2.1.1       
  rlang_1.1.4         rmarkdown_2.28      rpart_4.1.23       
  rstudioapi_0.16.0   rvest_1.0.4         sass_0.4.9         
  scales_1.3.0        selectr_0.4.2       splines_4.4.1      
  stats_4.4.1         stringi_1.8.4       stringr_1.5.1      
  sys_3.4.2           systemfonts_1.1.0   textshaping_0.4.0  
  tibble_3.2.1        tidyr_1.3.1         tidyselect_1.2.1   
  tidyverse_2.0.0     timechange_0.3.0    tinytex_0.52       
  tools_4.4.1         tzdb_0.4.0          utf8_1.2.4         
  utils_4.4.1         uuid_1.2.1          vctrs_0.6.5        
  viridis_0.6.5       viridisLite_0.4.2   vroom_1.6.5        
  withr_3.0.1         xfun_0.47           xml2_1.3.6         
  yaml_2.3.10
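
When only a few details are needed rather than the full listing, the same information can be queried piece by piece. This is a minimal sketch using base R; the stats package is chosen only because it ships with every R installation, and the variable names are illustrative.

```r
# The version string of the running R session.
r_version <- R.version.string

# The installed version of a single package, converted to plain text.
stats_version <- as.character(utils::packageVersion("stats"))

cat(r_version, "\n")
cat("stats", stats_version, "\n")
```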